import pandas as pd

GE Aviation - Remaining Useful Life Analysis
Part 4 - Model Building
Read the Data
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r"D:\School\FL 2022\ISA 401\GE\ge_data.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 36 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 dataset 100 non-null object
1 esn 100 non-null int64
2 unit 100 non-null int64
3 operator 100 non-null object
4 last_flight_cycle 100 non-null int64
5 last_datetime 100 non-null object
6 mean_tra 100 non-null int64
7 mean_t2 100 non-null float64
8 mean_t24 100 non-null float64
9 mean_t30 100 non-null float64
10 mean_t50 100 non-null float64
11 mean_p2 100 non-null float64
12 mean_p15 100 non-null float64
13 mean_p30 100 non-null float64
14 mean_nf 100 non-null float64
15 mean_nc 100 non-null float64
16 mean_epr 100 non-null float64
17 mean_ps30 100 non-null float64
18 mean_phi 100 non-null float64
19 mean_nrf 100 non-null float64
20 mean_nrc 100 non-null float64
21 mean_bpr 100 non-null float64
22 mean_farb 100 non-null float64
23 mean_htbleed 100 non-null float64
24 mean_nf_dmd 100 non-null int64
25 mean_pcnfr_dmd 100 non-null int64
26 mean_w31 100 non-null float64
27 mean_w32 100 non-null float64
28 mean_X44321P02_op016 100 non-null float64
29 mean_X44321P02_op420 100 non-null float64
30 mean_X54321P01_op116 100 non-null float64
31 mean_X54321P01_op220 100 non-null float64
32 mean_X65421P11_op232 100 non-null float64
33 mean_X65421P11_op630 100 non-null float64
34 total_distance 100 non-null float64
35 rul 100 non-null int64
dtypes: float64(26), int64(7), object(3)
memory usage: 28.2+ KB
Drop unnecessary variables
Unnecessary variables are dropped as explained in the previous part:
vars_to_drop = ['dataset','esn', 'unit', 'last_datetime','mean_tra','mean_t2','mean_p2',
'mean_epr','mean_farb','mean_nf_dmd', 'mean_pcnfr_dmd', 'mean_p15', 'mean_t24']
df.drop(vars_to_drop, axis = 1, inplace = True)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 operator 100 non-null object
1 last_flight_cycle 100 non-null int64
2 mean_t30 100 non-null float64
3 mean_t50 100 non-null float64
4 mean_p30 100 non-null float64
5 mean_nf 100 non-null float64
6 mean_nc 100 non-null float64
7 mean_ps30 100 non-null float64
8 mean_phi 100 non-null float64
9 mean_nrf 100 non-null float64
10 mean_nrc 100 non-null float64
11 mean_bpr 100 non-null float64
12 mean_htbleed 100 non-null float64
13 mean_w31 100 non-null float64
14 mean_w32 100 non-null float64
15 mean_X44321P02_op016 100 non-null float64
16 mean_X44321P02_op420 100 non-null float64
17 mean_X54321P01_op116 100 non-null float64
18 mean_X54321P01_op220 100 non-null float64
19 mean_X65421P11_op232 100 non-null float64
20 mean_X65421P11_op630 100 non-null float64
21 total_distance 100 non-null float64
22 rul 100 non-null int64
dtypes: float64(20), int64(2), object(1)
memory usage: 18.1+ KB
df.drop('rul', axis = 1).columns

Build Model
As mentioned in previous parts, the goal is to build a regression model that predicts rul. To do this, I used PyCaret to automate the model-building process.
from pycaret.regression import *
s = setup(df, target='rul', train_size = 0.9, session_id=123,
          remove_multicollinearity=True, multicollinearity_threshold=0.8,
          polynomial_features=True, feature_interaction=True, fold = 5)

| | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | rul |
| 2 | Original Data | (100, 23) |
| 3 | Missing Values | False |
| 4 | Numeric Features | 21 |
| 5 | Categorical Features | 1 |
| 6 | Ordinal Features | False |
| 7 | High Cardinality Features | False |
| 8 | High Cardinality Method | None |
| 9 | Transformed Train Set | (90, 17) |
| 10 | Transformed Test Set | (10, 17) |
| 11 | Shuffle Train-Test | True |
| 12 | Stratify Train-Test | False |
| 13 | Fold Generator | KFold |
| 14 | Fold Number | 5 |
| 15 | CPU Jobs | -1 |
| 16 | Use GPU | False |
| 17 | Log Experiment | False |
| 18 | Experiment Name | reg-default-name |
| 19 | USI | 9f1e |
| 20 | Imputation Type | simple |
| 21 | Iterative Imputation Iteration | None |
| 22 | Numeric Imputer | mean |
| 23 | Iterative Imputation Numeric Model | None |
| 24 | Categorical Imputer | constant |
| 25 | Iterative Imputation Categorical Model | None |
| 26 | Unknown Categoricals Handling | least_frequent |
| 27 | Normalize | False |
| 28 | Normalize Method | None |
| 29 | Transformation | False |
| 30 | Transformation Method | None |
| 31 | PCA | False |
| 32 | PCA Method | None |
| 33 | PCA Components | None |
| 34 | Ignore Low Variance | False |
| 35 | Combine Rare Levels | False |
| 36 | Rare Level Threshold | None |
| 37 | Numeric Binning | False |
| 38 | Remove Outliers | False |
| 39 | Outliers Threshold | None |
| 40 | Remove Multicollinearity | True |
| 41 | Multicollinearity Threshold | 0.800000 |
| 42 | Remove Perfect Collinearity | True |
| 43 | Clustering | False |
| 44 | Clustering Iteration | None |
| 45 | Polynomial Features | True |
| 46 | Polynomial Degree | 2 |
| 47 | Trignometry Features | False |
| 48 | Polynomial Threshold | 0.100000 |
| 49 | Group Features | False |
| 50 | Feature Selection | False |
| 51 | Feature Selection Method | classic |
| 52 | Features Selection Threshold | None |
| 53 | Feature Interaction | True |
| 54 | Feature Ratio | False |
| 55 | Interaction Threshold | 0.010000 |
| 56 | Transform Target | False |
| 57 | Transform Target Method | box-cox |
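
The setup output above reports a 90-row training set, a 10-row holdout set, and 5-fold KFold cross-validation. The following is a minimal sketch of what that implies for the size of each fold. The helper `kfold_indices` is hypothetical (it is not part of PyCaret), and it assumes contiguous, unshuffled folds over the training rows.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) for contiguous, unshuffled K-fold splits.

    Hypothetical helper for illustration only; not a PyCaret function.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional row when the split is uneven
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


n_train, n_holdout = 90, 10  # row counts taken from the setup table
fold_sizes = [(len(tr), len(va)) for tr, va in kfold_indices(n_train, 5)]

for i, (n_fit, n_val) in enumerate(fold_sizes):
    print(f"fold {i}: fit on {n_fit} rows, validate on {n_val} rows")
```

Under these assumptions each cross-validation score is computed on only 18 engines, which is one reason the per-fold metrics reported later vary so much.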
Because the dataset has only 100 observations, I restricted the comparison to the following linear models:
Linear Regression
Lasso Regression
Ridge Regression
Elastic Net
Least Angle Regression
Lasso Least Angle Regression
best = compare_models(include=['lr', 'lasso', 'ridge','en', 'lar', 'llar'])

| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|
| lasso | Lasso Regression | 29.4194 | 1446.4202 | 37.4531 | 0.4394 | 0.6341 | 0.7641 | 0.4380 |
| ridge | Ridge Regression | 29.6700 | 1467.4624 | 37.7583 | 0.4320 | 0.6414 | 0.7578 | 0.0060 |
| en | Elastic Net | 29.9627 | 1468.4204 | 37.7039 | 0.4301 | 0.6344 | 0.7736 | 0.0080 |
| lr | Linear Regression | 29.8843 | 1479.4645 | 37.9202 | 0.4274 | 0.6583 | 0.7648 | 1.0860 |
| llar | Lasso Least Angle Regression | 34.7031 | 1716.7393 | 41.1517 | 0.3576 | 0.7857 | 1.0526 | 0.0060 |
| lar | Least Angle Regression | 214.7651 | 133898.8908 | 262.3568 | -41.2500 | 1.3428 | 7.2594 | 0.0080 |
model = create_model('lasso')

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 30.5005 | 1306.2978 | 36.1427 | 0.6634 | 0.6453 | 0.8018 |
| 1 | 26.7571 | 1226.8475 | 35.0264 | 0.5233 | 0.6532 | 0.8463 |
| 2 | 28.2635 | 1168.3029 | 34.1804 | 0.4297 | 0.4068 | 0.3275 |
| 3 | 23.6148 | 997.8572 | 31.5889 | 0.6096 | 0.6542 | 0.8065 |
| 4 | 37.9614 | 2532.7959 | 50.3269 | -0.0291 | 0.8111 | 1.0384 |
| Mean | 29.4194 | 1446.4202 | 37.4531 | 0.4394 | 0.6341 | 0.7641 |
| Std | 4.8219 | 552.5607 | 6.6097 | 0.2473 | 0.1295 | 0.2349 |
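
As a sanity check on the table above, the Mean and Std rows can be recomputed from the five fold values. This sketch uses only the MAE and R2 columns, copied from the table, and the standard-library `statistics` module. That the Std row is a population standard deviation (dividing by the number of folds) is an inference from the numbers, not something the output states.

```python
import statistics

# Per-fold values copied from the create_model('lasso') table
mae_folds = [30.5005, 26.7571, 28.2635, 23.6148, 37.9614]
r2_folds = [0.6634, 0.5233, 0.4297, 0.6096, -0.0291]

mae_mean = statistics.mean(mae_folds)
mae_std = statistics.pstdev(mae_folds)     # population std (divide by n)
mae_sample_std = statistics.stdev(mae_folds)  # sample std (divide by n - 1)

r2_mean = statistics.mean(r2_folds)
r2_std = statistics.pstdev(r2_folds)

print(f"MAE mean={mae_mean:.4f}, population std={mae_std:.4f}, sample std={mae_sample_std:.4f}")
print(f"R2  mean={r2_mean:.4f}, population std={r2_std:.4f}")
```

The population standard deviation agrees with the Std row to within rounding, while the sample standard deviation is noticeably larger. With only five folds, and one of them (fold 4) having a negative R2, the spread is large relative to the mean.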
model = tune_model(model)

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 32.7466 | 1366.0072 | 36.9595 | 0.6480 | 0.6996 | 0.9206 |
| 1 | 29.0010 | 1413.4174 | 37.5954 | 0.4509 | 0.6608 | 0.8726 |
| 2 | 28.5340 | 1232.8230 | 35.1116 | 0.3982 | 0.4609 | 0.3293 |
| 3 | 22.5737 | 999.2052 | 31.6102 | 0.6091 | 0.6502 | 0.8043 |
| 4 | 39.8887 | 2599.1195 | 50.9816 | -0.0560 | 0.7829 | 1.0479 |
| Mean | 30.5488 | 1522.1145 | 38.4517 | 0.4100 | 0.6509 | 0.7949 |
| Std | 5.6942 | 557.3595 | 6.6018 | 0.2511 | 0.1058 | 0.2461 |
evaluate_model(model)
predict_model(model)  # predict on the holdout set

| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Lasso Regression | 18.3184 | 451.2826 | 21.2434 | 0.7812 | 0.4577 | 0.4554 |

| | last_flight_cycle | mean_nc | mean_htbleed | mean_X44321P02_op016 | mean_X44321P02_op420 | mean_X54321P01_op116 | mean_X54321P01_op220 | mean_X65421P11_op232 | mean_X65421P11_op630 | operator_AIC | operator_AXM | operator_FRON | operator_PGT | mean_X65421P11_op630_multiply_mean_X44321P02_op420 | mean_X54321P01_op116_multiply_last_flight_cycle | mean_htbleed_multiply_mean_nc | mean_nc_multiply_last_flight_cycle | rul | Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55.0 | 9050.980469 | 393.054535 | 24.049202 | 14.134301 | 30.032610 | 22.774393 | 239.792557 | 287.508270 | 0.0 | 0.0 | 1.0 | 0.0 | 4063.728516 | 1651.793579 | 3557529.00 | 4.978039e+05 | 123 | 107.292969 |
| 1 | 68.0 | 9066.904297 | 392.000000 | 17.068485 | 12.682482 | 28.912149 | 25.639204 | 186.075317 | 183.187302 | 0.0 | 0.0 | 1.0 | 0.0 | 2323.269531 | 1966.026123 | 3554226.50 | 6.165495e+05 | 130 | 118.529297 |
| 2 | 73.0 | 9057.731445 | 392.287659 | 8.047889 | 10.678687 | 33.833588 | 25.187513 | 240.154816 | 153.209244 | 0.0 | 0.0 | 0.0 | 1.0 | 1636.073608 | 2469.851807 | 3553236.25 | 6.612144e+05 | 122 | 120.369141 |
| 3 | 171.0 | 9059.825195 | 392.292389 | 21.018927 | 14.269072 | 22.455219 | 27.310680 | 188.682510 | 226.495361 | 0.0 | 0.0 | 1.0 | 0.0 | 3231.878418 | 3839.842529 | 3554100.50 | 1.549230e+06 | 127 | 87.818359 |
| 4 | 168.0 | 9057.583008 | 392.773804 | 17.294361 | 12.646115 | 33.331760 | 27.172123 | 229.029007 | 142.352966 | 0.0 | 1.0 | 0.0 | 0.0 | 1800.212036 | 5599.735840 | 3557581.25 | 1.521674e+06 | 28 | 38.929688 |
| 5 | 31.0 | 9049.054688 | 391.741943 | 16.257576 | 13.692347 | 23.372644 | 19.509785 | 220.605606 | 172.852112 | 0.0 | 0.0 | 0.0 | 1.0 | 2366.750977 | 724.552002 | 3544894.25 | 2.805207e+05 | 149 | 172.435547 |
| 6 | 105.0 | 9052.970703 | 392.838104 | 24.065445 | 9.905726 | 33.374115 | 17.178091 | 237.405304 | 128.142868 | 0.0 | 1.0 | 0.0 | 0.0 | 1269.348145 | 3504.281982 | 3556351.75 | 9.505619e+05 | 78 | 64.326172 |
| 7 | 144.0 | 9049.362305 | 392.375000 | 19.759495 | 12.905885 | 27.264637 | 22.027479 | 183.662613 | 214.633728 | 0.0 | 0.0 | 0.0 | 1.0 | 2770.038086 | 3926.107666 | 3550743.50 | 1.303108e+06 | 134 | 103.671875 |
| 8 | 162.0 | 9061.944336 | 392.864197 | 24.589872 | 14.170225 | 33.510212 | 19.755394 | 160.909058 | 223.617538 | 0.0 | 0.0 | 0.0 | 1.0 | 3168.710938 | 5428.654297 | 3560113.50 | 1.468035e+06 | 9 | 35.873047 |
| 9 | 98.0 | 9052.542969 | 392.938782 | 24.278660 | 12.311980 | 21.847729 | 26.233198 | 125.907394 | 178.445847 | 0.0 | 0.0 | 1.0 | 0.0 | 2197.021729 | 2141.077393 | 3557095.25 | 8.871492e+05 | 123 | 113.046875 |
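
The holdout metrics reported above can be recomputed by hand from the `rul` (actual) and `Label` (predicted) columns of the prediction table. The sketch below copies the ten pairs from the table and uses the standard definitions of MAE, MSE, RMSE, R2, and MAPE. It is meant as a cross-check, not as a description of how PyCaret computes them internally.

```python
import math

# (actual rul, predicted Label) pairs copied from the holdout prediction table
pairs = [
    (123, 107.292969), (130, 118.529297), (122, 120.369141),
    (127, 87.818359), (28, 38.929688), (149, 172.435547),
    (78, 64.326172), (134, 103.671875), (9, 35.873047),
    (123, 113.046875),
]
actual = [a for a, _ in pairs]
n = len(pairs)

errors = [a - p for a, p in pairs]
mae = sum(abs(e) for e in errors) / n
mse = sum(e * e for e in errors) / n
rmse = math.sqrt(mse)

# R2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean actual rul
mean_actual = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

# MAPE: each absolute error is scaled by its own actual value
ape = [abs(e) / abs(a) for e, a in zip(errors, actual)]
mape = sum(ape) / n

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f} MAPE={mape:.4f}")
print(f"largest single contribution to MAPE: {max(ape) / sum(ape):.0%}")
```

These values agree with the reported holdout row to within rounding. The engine with an actual rul of 9 accounts for roughly two thirds of the MAPE total, so on a 10-row holdout set that metric is dominated by a single observation.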
final_model = finalize_model(model)
final_model
Lasso(alpha=7.73, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=123,
selection='cyclic', tol=0.0001, warm_start=False)
Save the Model
save_model(final_model, 'model')
Transformation Pipeline and Model Successfully Saved
(Pipeline(memory=None,
steps=[('dtypes',
DataTypes_Auto_infer(categorical_features=[],
display_types=True, features_todrop=[],
id_columns=[], ml_usecase='regression',
numerical_features=[], target='rul',
time_features=[])),
('imputer',
Simple_Imputer(categorical_strategy='not_available',
fill_value_categorical=None,
fill_value_numerical=None,
numeric_strategy='me...
DFS_Classic(interactions=['multiply'], ml_usecase='regression',
n_jobs=-1, random_state=123, subclass='binary',
target='rul',
top_features_to_pick_percentage=None)),
('pca', 'passthrough'),
['trained_model',
Lasso(alpha=7.73, copy_X=True, fit_intercept=True,
max_iter=1000, normalize=False, positive=False,
precompute=False, random_state=123, selection='cyclic',
tol=0.0001, warm_start=False)]],
verbose=False),
'model.pkl')